-
Notifications
You must be signed in to change notification settings - Fork 20.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go.mod: use github.com/holiman/bloomfilter/v2 #22044
Conversation
Running a fast-sync with this PR on https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&from=1608478900273&to=now |
trie/sync_bloom.go
Outdated
m := float64(b.bloom.M()) | ||
|
||
return math.Pow(1.0-math.Exp((-k)*(n+0.5)/(m-1)), k) | ||
func hashToUint64(h common.Hash) uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mixes all the bits in a single uint64, is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the intent, yes. However, it's really not changed from before. The previous code took a []byte
as input, and called
func (f syncBloomHasher) Sum64() uint64 { return binary.BigEndian.Uint64(f) }
And this method here is exactly that same binary.BigEndian.Uint64
method, but adapted for a fixed-size array input.
trie/sync_bloom.go
Outdated
@@ -171,11 +157,11 @@ func (b *SyncBloom) Close() error { | |||
} | |||
|
|||
// Add inserts a new trie node hash into the bloom filter. | |||
func (b *SyncBloom) Add(hash []byte) { | |||
func (b *SyncBloom) Add(hash common.Hash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure this []byte
to common.Hash
change is a good idea. The old code throughout either used some existing []byte
value, or if it has a hash, it did hash[:]
. In both these cases, there's no copy involved as we're jut using the same backing memory area.
With this refactor however, you are making the parameter an array from potentially a []byte
, which always incurs a full copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SyncBloom.Add
always takes a common.Hash
without conversion, afaict
@@ -110,12 +97,11 @@ func (b *SyncBloom) init(database ethdb.Iteratee) { | |||
// If the database entry is a trie node, add it to the bloom | |||
key := it.Key() | |||
if len(key) == common.HashLength { | |||
b.bloom.Add(syncBloomHasher(key)) | |||
b.bloom.AddHash(binary.BigEndian.Uint64(key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for calling binary.Bigendian to manually convert instead of common.BytesToHash
like other occurances? Seems a bit dangerous as the two methods could go out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syncBloomHasher
was replaced by it's Sum64
, which is binary.BigEndian.Uint64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point here was that we have an explicit manual calculation and an implicit built-in calculation now, and it seems dangerous to have two means to arrive to the same number, as eventually one will get modified without the other and things will bork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a misunderstanding here.
However, we could keep b.bloom.Add(syncBloomHasher(key))
and redefine syncBloomHasher
as binary.BigEndian.Uint64
?
But I think you are conflating two different things, the built-in calculation has nothing to do with this
b.bloom.Add(syncBloomHasher(hash)) | ||
} else if ok, hash := rawdb.IsCodeKey(key); ok { | ||
// If the database entry is a contract code, add it to the bloom | ||
b.bloom.AddHash(binary.BigEndian.Uint64(hash)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for calling binary.Bigendian to manually convert instead of common.BytesToHash
like other occurances? Seems a bit dangerous as the two methods could go out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean the hashToUint64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
* deps: use improved bloom filter implementation * eth/handler, trie: use 4 keys for syncbloom + minor fixes * eth/protocols, trie: revert change on syncbloom method signature
This PR switches bloom filter, from the (archived) steakknife implementation a much much better (faster + less false positives) one.
Two orders of magnitude lower false-positive-rate achived here: #21724 (comment) .